System and method for mining patterns from relationship sequences extracted from big data

ABSTRACT

The various embodiments herein provide a system and method for mining frequent patterns in relationship space from a plurality of relationship sequences extracted from a big data. The system comprises a data repository for collecting and storing the big data, an Entity Store for collecting and storing a plurality of entities from the big data, an Entity Hierarchy for representing a hierarchical structure of entities, a Relationship Store for collecting and storing relationship instances between the plurality of entities, a Relationship Hierarchy for representing a hierarchical structure of relationships, a language/domain model for organizing entities and relationships in a hierarchical manner, a Pattern Query Processing Module (PQPM) for processing a pattern query related to finding patterns in relationships and entities, a Pattern Generation Module (PGM) to generate frequent patterns, and a Frequent Pattern Display Module (FPDM) to provide a visual presentation of the mined patterns.

CROSS-REFERENCE TO RELATED APPLICATION

The present application claims priority of Indian provisional application serial number 3286/C14E12012 filed on Aug. 10, 2012, and that application is incorporated in its entirety at least by reference.

BACKGROUND

1. Technical Field

The embodiments herein generally relate to data mining and particularly relate to mining patterns from structured, unstructured and semi-structured data from heterogeneous sources. The embodiments herein more particularly relate to a system and method for mining patterns in relationship sequences extracted from big data.

2. Description of the Related Art

Information explosion within and outside the organization has led to an exponential increase in unstructured data, while the systems currently used are especially meant for processing structured data. With the advent of big data systems such as columnar databases and map reduce frameworks such as Hadoop, it is now possible to store heterogeneous data at one point. A big data is any one or a combination of an unstructured data source, a semi-structured data source and a structured data source. However, making information available for analytics, or deriving new perspectives from big data to enable analytics, is something that is not understood clearly yet.

Pattern mining in structured data is a fairly well understood problem; however, pattern mining on unstructured data is much less understood. The approaches for pattern mining in structured and unstructured data are completely different. In both structured and unstructured pattern mining, the co-occurrence of entities decides the pattern, given that the entities can share multiple relationships among themselves. However, the co-occurrence of entities alone does not ensure that patterns are bound to the correct context.

One existing prior art provides a system and method for the automatic mining of new relationships which employs “association rule mining” to discover new relationships. The “association rule mining” technique basically uses the co-occurrence of words that are used to describe a relationship to find new relationships. The objective of this prior art is to discover new relationships between entities, given that a statistical module asserts the significance of the relationship and the relationship does not match existing relationships between the pair of entities already in the relation database. The prior art adds a new relationship after trying to resolve it with the existing relationships.

Another existing prior art provides a state-of-the-art method for effective pattern discovery for text mining which follows a term-based approach for closed sequence pattern mining. This effort too examines sequences that are formed by term occurrences. The prior art considers only text data, and the method of extracting patterns cannot be extended to other forms of data. Also, the prior art limits itself to mining patterns in entity space, where every term is considered as an entity.

There exist many limitations in existing prior arts which explain pattern mining in relationship space. The existing systems attempt to perform pattern mining on either structured or unstructured data and not on an amalgamation of both. Also, the approach of existing pattern mining is based on co-occurrence of two or more entities, and mines patterns in entity space only. These methods do not ensure contextual resolution of entities, as the same entities can co-occur in different contexts. The existing pattern-mining approaches do not mine patterns upon resolution of both entities and relationships, although certain aspects of entity resolution have been addressed. Further, many forms of representation of relationships that occur between entities are rather complex and require an expensive logical inference mechanism for realizing a hierarchy of the relationships. In an unstructured data context, it is important to arrive at a suitable representation of relationships that facilitates easy resolution of relationships.

In view of the foregoing, there is a need for a system and method for mining patterns in relationship sequences extracted from big data. There is also a need for a system and method for finding patterns based on co-occurring relationships. Further, there exists a need for a system and method which can extract frequent patterns in relationship space from relationship sequences.

The abovementioned shortcomings, disadvantages and problems are addressed herein, as will be understood by reading and studying the following specification.

SUMMARY

The primary object of the embodiments herein is to provide a system and method for mining patterns in a relationship space from a collection of structured, unstructured and semi-structured data.

Another object of the embodiments herein is to provide a system and method for enabling pattern extraction in relationship space by storing entities and relationships, and maintaining an entity hierarchy and a relationship hierarchy respectively.

Yet another object of the embodiments herein is to provide a system and method for building relationship sequences from heterogeneous data sources to represent the order in which the relationships occur to facilitate pattern mining.

Yet another object of the embodiments herein is to provide a system and method for extracting relevant relationship sequences from stored relationships using entity and relationship hierarchies for pattern mining.

Yet another object of the embodiments herein is to provide a system and method for generating the most frequent patterns in relationship space from relationship sequences.

Yet another object of the embodiments herein is to provide a system and method for deriving new perspectives from big data to enable analytics of the derived data.

These and other objects and advantages of the present invention will become readily apparent from the following detailed description taken in conjunction with the accompanying drawings.

The various embodiments herein provide a system for mining frequent patterns in relationship space from a plurality of relationship sequences extracted from a big data. The system comprises a data repository for collecting and storing the big data, an entity store for collecting and storing a plurality of entities from the big data, an entity hierarchy for representing a hierarchical structure of entities, a relationship store for collecting and storing relationship instances between the plurality of entities from the big data, a relationship hierarchy for representing a hierarchical structure of relationships, a language/domain model for organizing entities and relationships in a hierarchical manner, a Pattern Query Processing Module (PQPM) for processing a pattern query related to finding patterns in relationships and entities, a Pattern Generation Module (PGM) to generate frequent patterns from one or more relationship sequences from the data sources collected based on the pattern query, and a Frequent Pattern Display Module (FPDM) to provide a visual presentation of the mined patterns. The pattern generation module performs frequent pattern mining by gathering relevant data sources using the entity hierarchy and the relationship hierarchy. It generates relationship sequences with respect to each of the data sources and extracts the most frequent patterns in the collection of relationship sequences.

According to an embodiment herein, the big data comprises structured, unstructured and semi-structured data from heterogeneous data sources.

According to an embodiment herein, the entity store is a collection of entities extracted from the big data. The entity store stores specific information with respect to each entity.

According to an embodiment herein, the entity hierarchy represents a hierarchical structure of entities resolved using Natural Language Processing (NLP) techniques with the support of the Language and Knowledge Models.

According to an embodiment herein, the relationship store is adapted to store information related to each relationship instance.

According to an embodiment herein, the Relationship Hierarchy represents a hierarchical arrangement of relationships by resolving the relationships through at least one of a word-sense disambiguation technique, syntactic resolution and semantic resolution in conjunction with the language/domain model for context resolution.

According to an embodiment herein, the Pattern Query Processing Module (PQPM) processes the pattern query by expanding the pattern query in terms of entities after consulting with the entity store and the hierarchy of entities. The pattern query is a list comprising entities and relationships of the entities.

According to an embodiment herein, the Pattern Query Processing Module (PQPM) performs a query expansion of the pattern query to provide a relevant result by disambiguation and resolution of the entities in the pattern query. The disambiguation of the entities in the pattern query is conducted by identifying explicit and implicit similar entities and ignoring the dissimilar entities.

According to an embodiment herein, the Pattern Generation Module (PGM) comprises a document retriever to collect documents pertaining to the entities and relationships suggested by the query expansion, a Relationship Sequence Generator to create a relationship sequence with respect to each of the retrieved documents, and a Frequent Pattern Growth Module (FPGM) for extracting relevant relationship sequences.

According to an embodiment herein, the Relationship Sequence Generator builds the relationship sequences by treating each relationship as an item. Each relationship sequence comprises the relationships in the order of appearance in the data source.

According to an embodiment herein, the Frequent Pattern Growth Pattern-Mining Module (FPGM) adapts a Frequent Pattern Growth (FPG) algorithm for extracting relevant relationship sequences. The algorithm considers the relationship sequences as item-sets and extracts the most frequent item-sets.

The embodiments herein further provide a method for mining frequent patterns from a plurality of relationship sequences extracted from a big data. The method comprises extracting a plurality of entities from the big data, storing the extracted plurality of entities in an entity store, extracting and storing one or more relationships among the plurality of entities, building an entity hierarchy by arranging the plurality of entities in a hierarchical manner, creating a relationship hierarchy by arranging the relationships in a hierarchical manner, inputting a pattern query, where the pattern query is a list of entities and the relationships of the entities, processing the pattern query to find patterns in relationships and entities, retrieving relevant data sources from data using the entity hierarchy and the relationship hierarchy based on the pattern query, building relationship sequences with respect to one or more retrieved data sources, extracting frequent patterns from the relationship sequences, and displaying the frequent patterns on a frequent pattern display module.

According to an embodiment herein, the big data comprises structured, unstructured and semi-structured data from heterogeneous data sources for enabling data analysis on a single view.

According to an embodiment herein, generating frequent patterns among the relationship sequences is performed using a Frequent Pattern Growth Algorithm which considers the relationship sequences as item-sets and extracts the most frequent item-sets.

According to an embodiment herein, the method of extracting frequent patterns comprises collecting data sources pertaining to one or more entities and relationships contained in a pattern query, building a relationship sequence pertaining to each of the data sources by handling each relationship as an item in an item-set that represents a relationship sequence, building the relationship sequence in the order in which the relationships appear in the document, and identifying the frequent relationship sequences.

According to an embodiment herein, the method of processing the pattern query comprises extracting the hierarchy of the plurality of entities, expanding the pattern query in terms of entities based on the entity hierarchy and expanding the pattern query in terms of relationships based on the relationship hierarchy.

According to an embodiment herein, expanding the pattern query in terms of entities comprises disambiguating the entities in the pattern query, including synonyms and implied entities in the query expansion, and performing context resolution by including similar entities and discarding dissimilar entities.

According to an embodiment of the present invention, expanding the pattern query in terms of relationships comprises resolving relationships according to the context, including the relationships which imply context similarity, including the relationships that are implied within the syntactic and semantic similarity, and discarding the semantically and syntactically dissimilar relationships.

These and the other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.

BRIEF DESCRIPTION OF THE DRAWINGS

The other objects, features and advantages will occur to those skilled in the art from the following description of the preferred embodiment and the accompanying drawings in which:

FIG. 1 is a block diagram illustrating a system for frequent pattern mining in relationship space, according to an embodiment of the present disclosure.

FIG. 2 illustrates a flow chart of a method for performing frequent pattern mining in relationship space, according to an embodiment of the present disclosure.

FIG. 3 is a flow diagram illustrating a method for extracting frequent patterns, according to an embodiment of the present disclosure.

FIG. 4 is a flow chart illustrating a method for processing the pattern query, according to an embodiment of the present disclosure.

Although the specific features of the present invention are shown in some drawings and not in others, this is done for convenience only, as each feature may be combined with any or all of the other features in accordance with the present invention.

DETAILED DESCRIPTION OF DRAWINGS

In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which the specific embodiments that may be practiced are shown by way of illustration. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments, and it is to be understood that logical, mechanical and other changes may be made without departing from the scope of the embodiments. The following detailed description is therefore not to be taken in a limiting sense.

The various embodiments herein provide a system for mining frequent patterns in relationship space from a plurality of relationship sequences extracted from a big data. The system comprises a data repository for collecting and storing the big data, an entity store for collecting and storing a plurality of entities from the big data, an entity hierarchy that represents a hierarchical structure of entities, a relationship store for collecting and storing relationship instances between the plurality of entities from the big data, a relationship hierarchy that represents a hierarchical structure of relationships and a language/domain model for organizing entities and relationships in a hierarchical manner. The system further comprises a Pattern Query Processing Module (PQPM) for processing a pattern query related to finding patterns in relationships and entities, a Pattern Generation Module (PGM) to generate frequent patterns from one or more relationship sequences from the data sources collected based on the pattern query, and a Frequent Pattern Display Module (FPDM) to provide a visual presentation of the mined patterns. The pattern generation module performs frequent pattern mining by extracting relevant relationship sequences from the relationship store using the entity hierarchy and the relationship hierarchy.

The big data comprises structured, unstructured and semi-structured data from heterogeneous data sources for enabling data analysis on a single view.

The entity store is a collection of entities extracted from the big data. The entity store stores specific information that enables distinguishing one entity from another, so that one or more documents containing relevant entities corresponding to the pattern query can be retrieved. The entity hierarchy represents a hierarchical structure of entities resolved using Natural Language Processing (NLP) techniques with the support of the language/domain models.

The relationship store is adapted to store information related to each relationship instance for distinguishing one relationship instance from another. The Relationship Hierarchy represents a hierarchical arrangement of relationships by resolving the relationships through at least one of a word-sense disambiguation technique and a context resolution technique in conjunction with the language/domain model.

The Pattern Query Processing Module (PQPM) processes the pattern query by expanding the pattern query in terms of entities after consulting with the entity store and the entity hierarchy. The pattern query is a list comprising entities and relationships of the entities.

The Pattern Query Processing Module (PQPM) performs a context resolution of the pattern query to provide a relevant result by disambiguation of the entities in the pattern query. The disambiguation of the entities in the pattern query is conducted by considering synonyms and implied entities obtained during expansion of the pattern query, where similar entities are included and dissimilar entities are excluded.

The Pattern Generation Module (PGM) comprises a document retriever to collect documents pertaining to the entities and relationships contained in the pattern query, a Relationship Sequence Generator to create a relationship sequence with respect to each of the retrieved documents, and a Frequent Pattern Growth Module (FPGM) for extracting relevant relationship sequences.

The Relationship Sequence Generator builds the relationship sequences by treating each relationship as an item. Each relationship sequence comprises the relationships in the order of appearance in the document.
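As an illustration only, the sequence-building step might be sketched as follows. The record layout (`RelationshipInstance` and its fields), the function name and the sample data are assumptions made for this sketch; the specification does not prescribe a storage format for relationship instances.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipInstance:
    """One extracted relationship; every field name here is hypothetical."""
    doc_id: str     # document (data source) in which the instance was found
    offset: int     # position of the instance within that document
    subject: str
    relation: str   # resolved relationship label, treated as the item
    obj: str


def build_relationship_sequences(instances):
    """Return {doc_id: [relation, ...]} ordered by appearance in each document."""
    sequences = defaultdict(list)
    for inst in sorted(instances, key=lambda i: (i.doc_id, i.offset)):
        sequences[inst.doc_id].append(inst.relation)
    return dict(sequences)


# Hypothetical sample data, deliberately given out of order.
instances = [
    RelationshipInstance("doc1", 40, "AcmeCorp", "acquires", "BetaLtd"),
    RelationshipInstance("doc1", 12, "AcmeCorp", "invests_in", "BetaLtd"),
    RelationshipInstance("doc2", 5, "GammaInc", "partners_with", "AcmeCorp"),
]
sequences = build_relationship_sequences(instances)
```

Sorting by document and then by offset is one simple way to recover the order of appearance; any ordering key that reflects position in the source would serve the same purpose.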

The Frequent Pattern Growth Module (FPGM) adapts a Frequent Pattern Growth (FPG) algorithm for extracting relevant relationship sequences. The algorithm considers the relationship sequences as item-sets and extracts the most frequent item-sets.
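A minimal sketch of pattern-growth mining over relationship sequences is given below. It is not the FPGM itself: it uses the same divide-and-conquer recursion over conditional databases as Frequent Pattern Growth but omits the compressed FP-tree, and the function names, the `min_support` parameter and the sample sequences are assumptions introduced for illustration.

```python
from collections import Counter


def mine_frequent_itemsets(sequences, min_support):
    """Return {frozenset(relations): support} for item-sets meeting min_support."""
    # Order within a sequence is dropped here: each sequence becomes an item-set.
    database = [frozenset(seq) for seq in sequences]
    found = {}
    _grow(database, frozenset(), min_support, found)
    return found


def _grow(database, suffix, min_support, found):
    """Extend `suffix` with every item that is frequent in `database`."""
    counts = Counter()
    for items in database:
        counts.update(items)
    # Frequent items, most frequent first (ties broken by name).
    order = sorted((i for i in counts if counts[i] >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    for item in reversed(order):
        pattern = suffix | {item}
        found[pattern] = counts[item]
        # Conditional database: item-sets containing `item`, restricted to
        # frequent items ranked ahead of it, so each pattern is found once.
        conditional = []
        for items in database:
            if item in items:
                kept = frozenset(i for i in items
                                 if i in rank and rank[i] < rank[item])
                if kept:
                    conditional.append(kept)
        _grow(conditional, pattern, min_support, found)


# Hypothetical relationship sequences, one per retrieved document.
sample_sequences = [
    ["acquires", "funds", "partners_with"],
    ["acquires", "funds"],
    ["acquires", "partners_with"],
    ["funds"],
]
patterns = mine_frequent_itemsets(sample_sequences, min_support=2)
```

With the sample data and a minimum support of 2, the pair of `acquires` and `funds` is reported because it occurs in two sequences, while the pair of `funds` and `partners_with` occurs in only one and is dropped.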

The embodiments herein further provide a method for mining frequent patterns from a plurality of relationship sequences extracted from a big data. The method comprises extracting a plurality of entities from the big data. An entity refers to a concept comprising a language unit having an independent meaning. The plurality of entities extracted from the big data is stored in an entity store and the extracted entities are arranged in a hierarchical manner. Similarly, one or more relationships among the plurality of entities are extracted and stored, and a relationship hierarchy is created by arranging the relationships in a hierarchical manner. Further, a pattern query is inputted to a Pattern Query Processing Module, which processes the pattern query to find patterns in relationships and entities. Relevant data sources are retrieved from the data using the entity hierarchy and the relationship hierarchy based on the pattern query, relationship sequences are built with respect to one or more retrieved data sources, frequent patterns are extracted from the relationship sequences, and the frequent patterns are displayed on a frequent pattern display module. The pattern query is a list of entities and the relationships of the entities. Here, generating frequent patterns among the relationship sequences is performed using a Frequent Pattern Growth Algorithm which considers the relationship sequences as item-sets and extracts the most frequent item-sets.

The method of extracting frequent patterns comprises collecting data sources pertaining to one or more entities and relationships contained in a pattern query. Then a relationship sequence pertaining to each of the data sources is built by handling each relationship as an item in an item-set that represents a relationship sequence. Further, the relationship sequence is built in the order in which the relationships appear in the document, and finally the frequent relationship sequences are identified.

Similarly, the method of processing the pattern query comprises extracting the hierarchy of the plurality of entities, expanding the pattern query in terms of entities based on the entity hierarchy and expanding the pattern query in terms of relationships based on the relationship hierarchy.

Here, expanding the pattern query in terms of entities comprises disambiguating the entities in the pattern query, including synonyms and implied entities in the query expansion, and performing context resolution by including similar entities and discarding dissimilar entities. Similarly, expanding the pattern query in terms of relationships comprises resolving relationships according to the context, including the relationships which imply context similarity, including the relationships that are implied within the syntactic similarity, and discarding the contextually and syntactically dissimilar relationships.

FIG. 1 is a block diagram illustrating a system for frequent pattern mining in relationship space, according to an embodiment of the present disclosure. The system comprises a data repository 101, a Language/Domain Model 102, an entity store 103, an entity hierarchy 104, a relationship store 105, a relationship hierarchy 106, a query interface 107, a Pattern Query Processing Module (PQPM) 108, a Pattern Generation Module (PGM) 109 and a Frequent Pattern Display Module (FPDM) 110.

The data repository 101 is adapted for collecting and storing big data. The big data is a collection of all forms of data comprising structured, semi-structured and unstructured data from heterogeneous sources. A language/domain model 102 is used to resolve and organize entities and relationships in a hierarchy. The language/domain model 102 is used to disambiguate sense in unstructured data. The language/domain model 102 also disambiguates sense in the structured and semi-structured data contexts from the data repository 101.

The entity store 103 is a collection of entities extracted from the data repository 101. The entity store 103 also stores certain specific information relating to entities that helps in distinguishing them from other entities. The entity store 103 is used only to retrieve the documents containing the relevant entities corresponding to a pattern query. The entity hierarchy 104 is a hierarchical structure of entities that is built using Natural Language Processing (NLP) techniques with the support of the Language/Domain Model 102. The Language/Domain Model 102 is used to resolve and organize entities and relationships in a hierarchy. The Language/Domain Model 102 is especially used to disambiguate sense in unstructured data; it is also useful for disambiguating sense in the structured and semi-structured data contexts. After generation of the entity hierarchy, the entity hierarchy is made available to the Pattern Query Processing Module (PQPM) 108.

The relationship store 105 includes a collection of relationship instances and also stores certain information specific to relationship instances. The relationship hierarchy 106 is a hierarchical arrangement of relationships that are contextually resolved by word-sense disambiguation with the help of the Language/Domain Model 102. The relationship store 105 and the relationship hierarchy 106 function in conjunction with the Pattern Query Processing Module (PQPM) 108.

The Pattern Query Processing Module (PQPM) 108 receives a pattern query inputted through a query interface 107 and performs processing as per the required information. The pattern query comprises a list of entities and relationships. The PQPM 108 consults the entity store 103 and the entity hierarchy 104 and expands the pattern query in terms of entities. This entity expansion process involves disambiguating the entities in the pattern query, including the synonyms and implied entities in the query expansion, and making a context resolution to include the similar entities and exclude the dissimilar entities.
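The inclusion of synonyms and implied entities might be sketched as below. This covers only that part of the expansion; disambiguation and context resolution would rely on the language/domain model, which is not modelled here. The function name, the two dictionaries (`children` for the entity hierarchy, `synonyms` for equivalent names) and the sample data are assumptions made for this sketch.

```python
def expand_entities(query_entities, children, synonyms):
    """Expand each query entity with its synonyms and hierarchy descendants."""
    expanded = set()
    stack = list(query_entities)
    while stack:
        entity = stack.pop()
        if entity in expanded:
            continue  # already visited; also guards against cycles
        expanded.add(entity)
        stack.extend(synonyms.get(entity, ()))   # explicit equivalents
        stack.extend(children.get(entity, ()))   # entities implied by the hierarchy
    return expanded


# Hypothetical entity hierarchy (parent -> children) and synonym table.
children = {"vehicle": ["car", "truck"], "car": ["sedan"]}
synonyms = {"car": ["automobile"]}

expanded_entities = expand_entities(["car"], children, synonyms)
```

In this sample, a query for `car` pulls in `automobile` and `sedan` but not the sibling `truck`, which mirrors the idea of including similar entities and excluding dissimilar ones.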

The Pattern Generation Module (PGM) 109 comprises a Document Retriever 109a, a Relationship Sequence Generator 109b and a Frequent Pattern Growth Pattern-Mining Module (FPGMM) 109c. The Document Retriever 109a collects all documents pertaining to the entities/relationships contained in the pattern query. The Relationship Sequence Generator 109b generates a relationship sequence with respect to each document or data source by treating each relationship as an item. The Relationship Sequence Generator 109b builds a relationship sequence in the order of appearance in the document. The Frequent Pattern Growth Pattern-Mining Module (FPGMM) 109c uses a Frequent Pattern Growth (FPG) algorithm for processing the pattern query. The FPG algorithm treats the relationship sequences like item-sets and extracts the most frequent item-sets/relationship sequences. The Frequent Pattern Display Module (FPDM) 110 provides for visualizing the most frequent patterns extracted from relationship sequences in conjunction with the entities.

FIG. 2 illustrates a flow chart of a method for performing frequent pattern mining in relationship space, according to an embodiment of the present disclosure. The method comprises frequent pattern mining in relationship space. In particular, the method comprises processing of big data for recognizing a plurality of entities. The plurality of entities are then extracted and stored in an entity store. The entity store stores meaningful entities extracted out of big data irrespective of the form from which the entity originates (201). Entities are objects that make independent sense. Entities are named and unnamed objects, which include names of living and non-living things, concepts, theories or simply the language units that make independent sense. An entity is any one of named entities, such as names of places, people etc., or concepts that are represented by one or more terms (for example, ‘Purchase power’, ‘Purchase’ as a noun and ‘Purchase’ as a verb are three different concepts). In brief, the entity refers to named entities and concepts (language units with independent meaning). An entity hierarchy is then built by arranging the plurality of entities in a hierarchical manner (202). Further, a set of relationships among a plurality of entities is extracted and stored in a relationship store (203), and a relationship hierarchy is created by arranging the relationships in a hierarchical manner (204).

The method involves the use of the entity hierarchy and the relationship hierarchy in responding to the pattern query. In case of a pattern query, the pattern query is inputted to a Pattern Query Processing Module (PQPM) for finding frequent patterns related to entities and relationships in the query (205).

The document retriever collects the documents that are relevant to the pattern query (206). Based on the contents of the pattern query, the Relationship Sequence Generator generates a relationship sequence for each of the retrieved documents (207). The PGM adopts a Frequent Pattern Growth Module (FPGM) for identifying the frequent patterns among the relationship sequences (208). Finally, the identified patterns are displayed on a Frequent Pattern Display Module (FPDM) (209).
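One way to picture the document collection step (206) is a lookup in an inverted index from entity or relationship labels to document identifiers. The sketch below is an assumption about how such a lookup could work, not the retriever described in the specification; the function names, the index layout and the sample documents are all hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Map each entity or relationship label to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, labels in documents.items():
        for label in labels:
            index[label].add(doc_id)
    return index


def retrieve_documents(index, expanded_terms):
    """Return the ids of documents that mention any term of the expanded query."""
    hits = set()
    for term in expanded_terms:
        hits |= index.get(term, set())
    return sorted(hits)


# Hypothetical documents, each reduced to the labels extracted from it.
documents = {
    "doc1": {"AcmeCorp", "acquires"},
    "doc2": {"GammaInc", "partners_with"},
    "doc3": {"weather"},
}
index = build_index(documents)
relevant = retrieve_documents(index, {"acquires", "GammaInc"})
```

The retrieved identifiers would then be passed to the Relationship Sequence Generator, which builds one sequence per document.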

FIG. 3 is a flow diagram illustrating a method for extracting frequent patterns, according to an embodiment of the present disclosure. The method comprises receiving a pattern query in a Pattern Query Processing Module (PQPM). The PQPM processes the pattern query and communicates with a Pattern Generation Module (PGM). The PGM comprises three subunits: a Document Retriever, a Relationship Sequence Generator and a Frequent Pattern Growth Module (FPGM). Once the PGM receives the command from the PQPM, the document retriever starts collecting one or more documents (301). The one or more documents are related to the one or more entities and relationships contained in the pattern query. Once the related documents are collected, the Relationship Sequence Generator builds a relationship sequence in the order in which the relationships appear in the document (302). The relationship sequences, which appear like “item-sets”, enable frequent item-set mining. The item-sets comprise relationship sequences in an orderly manner for easy processing. Once an ordered item-set is built, the Frequent Pattern Growth Module (FPGM) mines for the required pattern as desired by the pattern query (303). The resulting frequent relationship sequences are then displayed by the Frequent Pattern Display Module (FPDM).

FIG. 4 is a flow chart illustrating a method for processing the pattern query, according to an embodiment of the present disclosure. The pattern query is raised by a user and inputted to a Pattern Query Processing Module (PQPM). Depending on the content of the pattern query, the PQPM expands the pattern query in terms of entities by referring to the entity list and the entity hierarchy (401). Expanding the pattern query in terms of entities includes the steps of disambiguating the entities in the pattern query, including synonyms and implied entities in the query expansion, and performing context resolution by including similar entities and discarding dissimilar entities. The PQPM then expands the pattern query in terms of relationships based on the relationship hierarchy (402). Here, expanding the pattern query in terms of relationships includes resolving relationships according to the context, including the relationships which imply context similarity, including the relationships that are implied within the syntactic similarity, and discarding the contextually and syntactically dissimilar relationships.
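The relationship side of the expansion (402) could be sketched as follows, assuming the relationship hierarchy is available as a child-to-parent mapping. Treating "contextually similar" as "shares a query relationship as an ancestor" is a simplification chosen for this sketch; the function names and sample hierarchy are hypothetical, and real context resolution would draw on the language/domain model.

```python
def ancestors(relation, parent):
    """Yield the relation itself and then each ancestor up to the hierarchy root."""
    seen = set()
    while relation is not None and relation not in seen:
        seen.add(relation)
        yield relation
        relation = parent.get(relation)


def expand_relationships(query_relations, parent):
    """Keep relationships that fall under a query relationship; discard the rest."""
    wanted = set(query_relations)
    known = set(parent) | {p for p in parent.values() if p is not None}
    related = {r for r in known if any(a in wanted for a in ancestors(r, parent))}
    return related | wanted


# Hypothetical relationship hierarchy (child -> parent, None marks a root).
parent = {
    "acquires": "gains_control_of",
    "buys_out": "gains_control_of",
    "gains_control_of": None,
    "sues": "disputes_with",
    "disputes_with": None,
}
expanded_relations = expand_relationships(["gains_control_of"], parent)
```

Here `acquires` and `buys_out` are included because they sit under the queried relationship, while `sues` belongs to a different branch and is discarded.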

The embodiments of the present invention disclose an approach that looks for patterns in the relationship space. The embodiments of the present disclosure provide a robust approach to find patterns and ensure context resolution effectively. The entities and the relationships among the entities assist in understanding the big data. All the entities and relationships are derived and collected. This collection of entities and relationships serves as input to all intelligent processing of data. Data mining and data analysis applications, forecasting, predictive analytics applications and machine learning applications make use of the patterns to learn further insights. The embodiments herein enable an enterprise to facilitate processing of big data and build applications on top of it. The embodiments herein also allow building of domain-specific, niche applications that harness big data. The embodiments herein provide immense benefit to sectors including, but not limited to, retail, health and pharmaceutical services, banking and insurance.

The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification.

What is claimed is:
 1. A system for mining frequent patterns in relationship space from a plurality of relationship sequences extracted from a big data, the system comprising: a data repository for collecting and storing the big data; an entity store for collecting and storing a plurality of entities from the big data; an entity hierarchy for representing a hierarchical structure of entities; a relationship store for collecting and storing relationship instances between the plurality of entities from the big data; a relationship hierarchy for representing a hierarchical structure of relationships; a language/domain model for organizing entities and relationships in a hierarchical manner; a Pattern Query Processing Module (PQPM) for expanding a pattern query related to finding patterns in relationships and entities; a Pattern Generation Module (PGM) to generate frequent patterns from one or more relationship sequences from the data sources collected based on the pattern query; and a Frequent Pattern Display Module (FPDM) to provide a visual presentation of the mined patterns; where the pattern generation module performs frequent pattern mining by extracting relevant relationship sequences from the relationship store using the entity hierarchy and the relationship hierarchy.
 2. The system according to claim 1, wherein the big data comprises structured, unstructured and semi-structured data from heterogeneous data sources.
 3. The system according to claim 1, wherein the entity store is a collection of entities extracted from the big data, wherein the entity store stores information specific to each entity.
 4. The system according to claim 1, wherein the entity hierarchy is a hierarchical structure of entities resolved using Natural Language Processing (NLP) techniques with the support of the language/domain model.
 5. The system according to claim 1, wherein the relationship store is adapted to store information related to each relationship instance.
 6. The system according to claim 1, wherein the Relationship Hierarchy provides a hierarchical arrangement of relationships by resolving the relationships through at least one of a word-sense disambiguation technique, syntactic and semantic similarity and context resolution technique in conjunction with the language/domain model.
 7. The system according to claim 1, wherein the Pattern Query Processing Module (PQPM) processes the pattern query by expanding the pattern query in terms of entities after consulting the entity hierarchy, wherein the pattern query is a list comprising entities and relationships of the entities.
 8. The system according to claim 1, wherein the Pattern Query Processing Module (PQPM) performs an expansion of the pattern query to provide a relevant result by disambiguation of the entities in the pattern query, where the disambiguation of the entities in the pattern query is conducted by identifying explicit and implicit similar entities and ignoring the dissimilar entities.
 9. The system according to claim 1, wherein the Pattern Generation Module (PGM) comprises: a document retriever to collect documents pertaining to the entities and relationships suggested by the query expansion; a Relationship Sequence Generator to create a relationship sequence with respect to each of the retrieved documents; and a Frequent Pattern Growth Module (FPGM) for extracting relevant relationship sequences.
 10. The system according to claim 9, wherein the Relationship Sequence Generator builds the relationship sequences by treating each relationship as an item, where each relationship sequence comprises the relationships in the order of appearance in the document.
 11. The system according to claim 9, wherein the Frequent Pattern Growth Pattern-Mining Module (FPGM) adapts a Frequent Pattern Growth (FPG) algorithm for extracting relevant relationship sequences which considers the relationship sequences as item-sets and extracts the most frequent item-sets.
 12. A method for mining frequent patterns from a plurality of relationship sequences extracted from a big data, the method comprising: extracting a plurality of entities from the big data, where an entity refers to concepts comprising a language unit having an independent meaning; storing the extracted plurality of entities in an entity store; extracting and storing one or more relationships among the plurality of entities; building an entity hierarchy by arranging the plurality of entities in a hierarchical manner; creating a relationship hierarchy by arranging the relationships in a hierarchical manner; inputting a pattern query, where the pattern query is a list of entities and the relationship of entities; expanding the pattern query to include the most relevant entities and relationships and ignore irrelevant patterns and relationships; retrieving relevant data sources from data using the pattern query; building relationship sequences with respect to one or more retrieved data sources; extracting frequent patterns from the relationship sequences; and displaying the frequent patterns on a frequent pattern display module.
 13. The method according to claim 12, wherein the big data comprises structured, unstructured and semi-structured data from heterogeneous data sources for enabling data analysis on a single view.
 14. The method according to claim 12, wherein generating frequent patterns among the relationship sequences is performed using a Frequent Pattern Growth Algorithm which considers the relationship sequences as item-sets and extracts the most frequent item-sets.
 15. The method according to claim 12, wherein the method of extracting frequent patterns comprises: collecting data sources pertaining to one or more entities and relationships contained in a pattern query; building a relationship sequence pertaining to each of the data sources by handling each relationship as an item in an item-set that represents a relationship sequence; building a relationship sequence in an order the relationships appear in the document; and identifying the frequent relationship sequences.
 16. The method according to claim 12, wherein the method of processing the pattern query comprises: extracting the hierarchy of the plurality of entities; expanding the pattern query in terms of entities based on the entity hierarchy; and expanding the pattern query in terms of relationships based on the relationship hierarchy.
 17. The method according to claim 16, wherein expanding the pattern query in terms of entity comprises: disambiguating the entities in the pattern query; including synonyms and implied entities in the query expansion; and discarding dissimilar entities.
 18. The method according to claim 16, wherein expanding the pattern query in terms of relationships comprises: resolving relationships according to the context; including the relationships which imply context similarity; including the relationships that are implied within the syntactic similarity; and discarding the contextually and syntactically dissimilar relationships.